Abstract We investigate projected scaled gradient (PSG) methods for convex minimization problems. These methods perform a descent step along a diagonally scaled gradient direction followed by a feasibility regaining step via orthogonal projection onto the constraint set. This constitutes a generalized algorithmic structure that encompasses as special cases the gradient projection method, the projected Newton method, the projected Landweber-type methods and the generalized Expectation-Maximization (EM)-type methods. We prove the convergence of the PSG methods in the presence of bounded perturbations. This resilience to bounded perturbations is relevant to the ability to apply the recently developed superiorization methodology to PSG methods, in particular to the EM algorithm.


1 Introduction 

In this paper we consider convex minimization problems of the form

$$\text{minimize } J(x) \quad \text{subject to } x \in \Omega. \qquad (1)$$

The constraint set $\Omega \subseteq \mathbb{R}^n$ is assumed to be nonempty, closed and convex, and the objective function $J : \Omega \to \mathbb{R}$ is convex. Many problems in engineering and technology can be modeled by (1). Gradient-type iterative methods are widely advocated for such problems, and there is an extensive literature on projected gradient and subgradient methods as well as their incremental variants, see, e.g., [6, 30, …, 54].

In particular, the weighted Least-Squares (LS) distance and the Kullback-Leibler (KL) distance (also known as I-divergence or cross-entropy [21]) are two special instances of the Bregman distances [… , p. 33]. Both are commonly adopted as proximity functions that measure constraint compatibility in image reconstruction from projections […]. Minimization of the LS or the KL distance subject to additional constraints, such as nonnegativity, naturally falls within the scope of (1). Correspondingly, the Landweber iteration [52] is a general gradient method for weighted LS problems [2, Section 6.2], [12, Section 4.6], [36], [47], [53]. The class of expectation-maximization (EM) algorithms […] consists essentially of scaled gradient methods for the minimization of the KL distance […].

Motivated by the scaled gradient formulation of EM-type algorithms, we 
focus our attention on the family of projected scaled gradient (PSG) methods, 
the basic iterative step of which is given by 


$$x^{k+1} := P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k)\big), \qquad (2)$$

where $\tau_k$ denotes the stepsize, $D(x^k)$ is a diagonal scaling matrix and $P_\Omega$ is the orthogonal (Euclidean least-distance) projection onto $\Omega$. To our knowledge, the PSG methods presented here date back to […, Eq. (29)], and they resemble the projected Newton method studied in [5].
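The basic step (2) can be sketched in a few lines of Python/NumPy. This is a minimal, hypothetical illustration and not the authors' implementation. It makes three assumptions. First, $\Omega = \mathbb{R}^n_+$, so that $P_\Omega$ is componentwise clipping at zero. Second, $D(x)$ is represented by the vector of its diagonal entries. Third, the objective is the toy function $J(x) = \frac{1}{2}\|x - c\|^2$. The names `psg_step`, `project_nonneg` and `d_diag` are our own.

```python
import numpy as np


def project_nonneg(z):
    """Euclidean projection onto the nonnegative orthant R^n_+ (the assumed Omega)."""
    return np.maximum(z, 0.0)


def psg_step(x, tau, d_diag, grad_J, project=project_nonneg):
    """One PSG step x+ = P_Omega(x - tau * D(x) grad J(x)), cf. (2).

    d_diag(x) returns the diagonal entries of D(x), so the matrix-vector
    product D(x) grad J(x) reduces to an elementwise product.
    """
    return project(x - tau * d_diag(x) * grad_J(x))


# Toy problem (an assumption): J(x) = 0.5 * ||x - c||^2 over x >= 0,
# whose minimizer is max(c, 0) componentwise.
c = np.array([1.0, -2.0, 3.0])
grad_J = lambda x: x - c
identity_scaling = lambda x: np.ones_like(x)  # D(x) = I_n gives gradient projection

x = np.zeros(3)
for _ in range(60):
    x = psg_step(x, tau=0.5, d_diag=identity_scaling, grad_J=grad_J)
print(x)  # close to [1, 0, 3]
```

Other methods in the PSG family are obtained by swapping in a different `d_diag`.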

From the algorithmic structural point of view, the family of PSG methods includes, but is not limited to:

- the Goldstein-Levitin-Polyak gradient projection method,
- the projected Newton method [5],
- the projected Landweber method [2, Section 6.2], [53],
- generalized EM-type methods […].

The PSG methods should be distinguished from the scaled gradient projection (SGP) methods in the literature [5], [7]. PSG methods belong to the class of two-metric projection methods [25]. These use one norm to compute the descent direction and a different norm for the projection operation. SGP methods use the same norm for both.

The main purpose of this paper is to investigate the convergence behavior of PSG methods and their bounded perturbation resilience. This is inspired by the recently developed superiorization methodology (SM) [15], […], [55]. The superiorization methodology takes an iterative algorithm and investigates its perturbation resilience. It then uses such permitted perturbations proactively, forcing the perturbed algorithm to do something useful in addition to what it was originally designed to do. The original unperturbed algorithm is called the “Basic Algorithm” and the perturbed algorithm is called the “Superiorized Version of the Basic Algorithm”.
Suppose the original algorithm¹ is computationally efficient and useful for the application at hand, and the perturbations are simple and inexpensive to calculate. Then, at essentially the same computational cost as the original Basic Algorithm, we get something more by steering its iterates according to the perturbations.

This is a very general principle. It has been used successfully in some important practical applications and awaits implementation and testing in additional fields; see, e.g., the recent papers […] for applications in intensity-modulated radiation therapy and in nondestructive testing. The principles of superiorization and perturbation resilience are reviewed in the recent […] and […], along with many references to works in which they were used. The second author has compiled, and continuously updates, a chronologically ordered bibliography of scientific publications on the superiorization methodology and perturbation resilience of algorithms. It is now available at: http://math.haifa.ac.il/yair/bib-superiorization-censor.html

In a nutshell, the SM lies between feasibility-seeking and constrained minimization. It does not attempt to solve the full-fledged constrained minimization problem. Instead, the task is to seek a feasible solution that is superior with respect to the given objective function. This can be beneficial in two cases: when no exact approach to the constrained minimization has yet been discovered, or when exact approaches demand excessive computer resources or computation time. In such cases, existing feasibility-seeking algorithms that are perturbation resilient can be turned into efficient algorithms that perform superiorization.

The basic idea of the SM originates from the discovery that some feasibility-seeking projection algorithms for convex feasibility problems are bounded perturbations resilient [5]. The SM takes advantage of this property in the String-Averaging Projections (SAP) [17] and Block-Iterative Projections (BIP) methods. It steers the iterates of the original feasibility-seeking projection method towards a reduced, but not necessarily minimal, value of the objective function of the constrained minimization problem at hand, see, e.g., […].

The mathematical principles of the SM over general consistent “problem structures”, with the notion of bounded perturbation resilience, were formulated in [14]. The framework of the SM was extended to the inconsistent case by using the notion of strong perturbation resilience [33]. Most recently, the effectiveness of the SM was demonstrated by a performance comparison with the projected subgradient method for constrained minimization problems […].

However, the SM is not limited to feasibility-seeking algorithms. It can take any “Basic Algorithm” that is bounded perturbations resilient and introduce certain permitted perturbations into its iterates. The resulting algorithm is then automatically steered to produce an output that is superior with respect to the given objective function. See Subsection 4.1 below for more details on this point.

¹ We use the term “algorithm” for the iterative processes discussed here, even for those that do not include any termination criterion. This does not create any ambiguity, because it is always clear from the context whether we consider an infinite iterative process or an algorithm with a termination rule.

Specifically, efforts have recently been made to derive a superiorized version of the EM algorithm, and this is why we study the bounded perturbation resilience of the PSG methods here. Superiorization of the EM algorithm was first reported experimentally in our previous work, with application to bioluminescence tomography […]. Such a superiorized version of the EM iteration was later applied to single photon emission computed tomography [43]. Its effectiveness was further validated by a study using statistical hypothesis testing in the context of positron emission tomography […].

These efforts with regard to the EM algorithm prompted the research reported here. Namely, we need to secure bounded perturbation resilience of the EM algorithm. This would justify using a superiorized version of it to seek total variation (TV) reduced values of the image vector x in an image reconstruction problem that employs an EM algorithm; see Section 4 below.

Two facts prompt us to investigate the PSG methods, which encompass both cases, under bounded perturbations. First, the algebraic reconstruction technique (ART), see, e.g., [35, Chapter 11] and references therein, is related to the Landweber iteration [53], [60] for weighted LS problems. Second, EM is essentially a scaled gradient method for KL minimization […].

In view of the above considerations, we ask whether the convergence of PSG methods is preserved in the presence of bounded perturbations. In this study, we provide an affirmative answer to this question. First we prove the convergence of the iterates generated by

$$x^{k+1} := P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k) + e(x^k)\big), \qquad (3)$$

where $\{e(x^k)\}_{k=0}^\infty$ denotes the sequence of outer perturbations, which satisfies

$$\sum_{k=0}^{\infty} \|e(x^k)\| < +\infty. \qquad (4)$$

This convergence result is then translated into the desired bounded perturbation resilience of PSG methods (in Section 4 below).
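The perturbed iteration (3) under the summability condition (4) can be illustrated with the following sketch. It reuses the toy setting $J(x) = \frac{1}{2}\|x - c\|^2$, $\Omega = \mathbb{R}^n_+$ and $D(x^k) = I_n$, all of which are assumptions made only for illustration. It injects random perturbations with $\|e(x^k)\| = 0.9^k$, whose norms sum to less than 10. Under these assumptions the perturbed iterates still approach the unperturbed limit.

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.array([1.0, -2.0, 3.0])
grad_J = lambda x: x - c              # J(x) = 0.5 * ||x - c||^2, Lipschitz constant L = 1

x = np.zeros(3)
total_perturbation = 0.0
for k in range(200):
    # Outer perturbation e(x^k) with norm 0.9**k, so the norms are summable as in (4).
    direction = rng.standard_normal(3)
    e_k = 0.9**k * direction / np.linalg.norm(direction)
    total_perturbation += np.linalg.norm(e_k)
    # Iteration (3) with tau_k = 0.5 and D(x^k) = I_n.
    x = np.maximum(x - 0.5 * grad_J(x) + e_k, 0.0)

print(x, total_perturbation)  # x is close to [1, 0, 3]; the sum stays below 10
```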

The algorithmic structure of (3) is adapted from the general framework of the feasible descent methods studied in [38]. Compared with [38], our algorithmic extension has two aspects. First, the diagonally scaled gradient is incorporated, which allows us to include additional cases such as generalized EM-type methods. Second, the perturbations in [38] were given as

$$\|e(x^k)\| \le \gamma\,\|x^k - x^{k+1}\| \quad \text{for some } \gamma > 0, \ \forall k, \qquad (5)$$

so as not to deviate too much from gradient projection methods. In our case the perturbations are only assumed to be bounded.

Bounded perturbations as in (4) were previously studied in the context of inexact matrix splitting algorithms for the symmetric monotone linear complementarity problem […]. This was further investigated in […] under milder assumptions by extending the proof of […]. Additionally, convergence of the feasible descent method with nonvanishing perturbations, and its generalization to incremental subgradient-type methods, were reported in [58] and [55], respectively.

The paper is organized as follows:

- In Section 2 we introduce the PSG methods by studying two particular cases of the proximity function minimization problems for image reconstruction.
- In Section 3 we present our main convergence results for the PSG method with bounded perturbations, namely, the convergence of (3)-(4). We call the latter “outer perturbations” because of the location of the term $e(x^k)$ in (3).
- In Section 4 we prove the bounded perturbation resilience of the PSG method by establishing a relationship between the inner perturbations and the outer perturbations.


2 Projected Scaled Gradient Methods 

In this section, we introduce the background and motivation of the projected scaled gradient (PSG) methods for (1). As mentioned before, the PSG methods generate iterates according to the formula

$$x^{k+1} = P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k)\big), \quad k = 0, 1, 2, \ldots, \qquad (6)$$

where $\{\tau_k\}_{k=0}^\infty$ is a sequence of positive stepsizes and $\{D(x^k)\}_{k=0}^\infty$ is a sequence of diagonal scaling matrices. The diagonal scaling matrices precondition the gradient direction. They also induce a general algorithmic structure that encompasses many existing algorithms as special cases.

In particular, the PSG methods include the gradient projection method […], which corresponds to $D(x^k) = I_n$ for all $k$, where $I_n$ is the identity matrix of order $n$. When $D(x^k) \approx \nabla^2 J(x^k)^{-1}$, that is, when the diagonal scaling matrix adequately approximates the inverse Hessian, the PSG method reduces to the projected Newton method [5]. In general, different choices of the diagonal scaling matrices give rise to different concrete algorithms. How to choose appropriate diagonal scaling matrices depends on the particular problem.

We investigate the class of projected scaled gradient (PSG) methods by concentrating on two particular cases of (1). Consider the following linear image reconstruction problem model with a nonnegativity constraint:

$$Ax = b, \quad x \ge 0, \qquad (7)$$

Here $A = (a^i_j)_{i=1,j=1}^{m,n}$ is an $m \times n$ matrix in which $a^i = (a^i_j)_{j=1}^n \in \mathbb{R}^n$ is the $i$th column of its transpose $A^T$. The vectors are $x = (x_j)_{j=1}^n \in \mathbb{R}^n$ and $b = (b_i)_{i=1}^m \in \mathbb{R}^m$. The entries of $A$, $x$ and $b$ are all assumed to be nonnegative. For simplicity, we denote $\Omega_0 := \mathbb{R}^n_+$ hereafter.
2.1 Projected Landweber-type Methods

The linear problem model (7) can be approached as the following constrained weighted Least-Squares (LS) problem,

$$\text{minimize } J_{LS}(x) \quad \text{subject to } x \in \Omega_0, \qquad (8)$$

where the weighted LS functional $J_{LS}(x)$ is defined by

$$J_{LS}(x) := \frac{1}{2}\|b - Ax\|_W^2 = \frac{1}{2}\langle W(b - Ax),\, b - Ax\rangle, \qquad (9)$$

with $W$ a symmetric weighting matrix depending on the specific problem. The gradient of $J_{LS}(x)$ for any $x \in \mathbb{R}^n$ is

$$\nabla J_{LS}(x) = -A^T W (b - Ax). \qquad (10)$$

The projected Landweber method [2, Section 6.2] for (8) uses the iteration

$$x^{k+1} = P_{\Omega_0}\big(x^k + \tau_k A^T W (b - Ax^k)\big). \qquad (11)$$

By (10), the above iteration (11) can be written as

$$x^{k+1} = P_{\Omega_0}\big(x^k - \tau_k \nabla J_{LS}(x^k)\big), \qquad (12)$$

which clearly belongs to the family of PSG methods for (8), with the diagonal scaling matrix $D(x^k) = I_n$ for all $k$.
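The sketch below checks numerically that (11) and (12) produce identical iterates. It uses a small consistent system with made-up data and a hypothetical diagonal weighting $W$. The stepsize is the constant $\tau_k = 1/L$, where $L$ is the largest eigenvalue of $A^T W A$, i.e., the Lipschitz constant of $\nabla J_{LS}$.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 1.0], [1.0, 1.0]])
x_true = np.array([0.5, 2.0])
b = A @ x_true                        # consistent, nonnegative data
W = np.diag([1.0, 0.5, 2.0])          # hypothetical (symmetric) weighting matrix


def grad_J_LS(x):
    """Gradient (10) of the weighted LS functional (9)."""
    return -A.T @ W @ (b - A @ x)


L = np.linalg.norm(A.T @ W @ A, 2)    # spectral norm = Lipschitz constant of grad J_LS
tau = 1.0 / L                         # constant stepsize in (0, 2/L)

x = np.zeros(2)
for _ in range(500):
    landweber = np.maximum(x + tau * A.T @ W @ (b - A @ x), 0.0)  # form (11)
    psg = np.maximum(x - tau * grad_J_LS(x), 0.0)                 # form (12)
    assert np.allclose(landweber, psg)
    x = psg
print(x)  # approaches x_true = [0.5, 2.0]
```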

The projected Landweber method with diagonal preconditioning for (8), as studied in […], uses the iteration

$$x^{k+1} = P_{\Omega_0}\big(x^k + \tau_k V A^T W (b - Ax^k)\big), \qquad (13)$$

where $V$ is a diagonal $n \times n$ matrix satisfying certain conditions, see [53, p. 446, (i)-(iii)]. By (10), (13) is equivalent to the iteration

$$x^{k+1} = P_{\Omega_0}\big(x^k - \tau_k V \nabla J_{LS}(x^k)\big), \qquad (14)$$

and hence it also belongs to the family of PSG methods, with $D(x^k) = V$ for all $k$.

In general, the projected Landweber-type methods for (8) are given by

$$x^{k+1} = P_{\Omega_0}\big(x^k - \tau_k D_{LS} \nabla J_{LS}(x^k)\big), \qquad (15)$$

where the diagonal scaling matrices are typically constant positive definite matrices of the form

$$D_{LS} := \operatorname{diag}\Big\{\frac{1}{s_j}\Big\}, \quad s_j \in \mathbb{R} \text{ and } s_j > 0 \text{ for all } j = 1, 2, \ldots, n. \qquad (16)$$

Each $s_j$ may be constructed from the linear system matrix $A$ of (7) and may be sparsity-pattern oriented […, Eq. (2.2)].
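The sketch below illustrates the diagonal scaling (16) with the hypothetical choice $s_j = \sum_{i=1}^m a^i_j$, i.e., the column sums of $A$. This is only one possible instance of (16), not the sparsity-pattern-oriented construction cited above. The sketch checks that the preconditioned Landweber form (13) with $V = D_{LS}$ and the PSG form (15) give the same step on made-up data.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
b = np.array([2.0, 4.0])
W = np.eye(2)                          # unweighted case, W = I_m

# Hypothetical choice of s_j in (16): column sums of A (all positive here).
s = A.sum(axis=0)
D_LS = np.diag(1.0 / s)


def grad_J_LS(x):
    return -A.T @ W @ (b - A @ x)      # gradient (10)


x = np.array([0.3, 0.1, 0.7])
tau = 0.5
step_13 = np.maximum(x + tau * D_LS @ A.T @ W @ (b - A @ x), 0.0)  # (13) with V = D_LS
step_15 = np.maximum(x - tau * D_LS @ grad_J_LS(x), 0.0)           # (15)
print(step_13, step_15)
```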
2.2 Generalized EM-type Methods

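The Kullback-Leibler distance is a widely adopted proximity function in the field of image reconstruction. Using it, we seek a solution of (7) by minimizing the Kullback-Leibler distance between $b$ and $Ax$, given by

$$J_{KL}(x) := KL(b, Ax) = \sum_{i=1}^m \Big( b_i \ln \frac{b_i}{\langle a^i, x\rangle} + \langle a^i, x\rangle - b_i \Big), \qquad (17)$$

over nonnegativity constraints, i.e.,

$$\text{minimize } J_{KL}(x) \quad \text{subject to } x \in \Omega_0. \qquad (18)$$

The gradient of $J_{KL}(x)$ is

$$\nabla J_{KL}(x) = \sum_{i=1}^m \Big(1 - \frac{b_i}{\langle a^i, x\rangle}\Big) a^i. \qquad (19)$$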
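The functional (17) and its gradient (19) translate directly into array operations. The sketch below uses small made-up data: $A$ with positive entries, $b > 0$, and a positive $x$, so that every $\langle a^i, x\rangle > 0$ and the logarithms are defined. It compares (19) with a central finite-difference approximation.

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.5, 1.0], [2.0, 0.3]])
b = np.array([3.0, 1.0, 2.0])


def J_KL(x):
    """KL(b, Ax) as in (17); requires b > 0 and Ax > 0 componentwise."""
    Ax = A @ x                          # entries <a^i, x>
    return np.sum(b * np.log(b / Ax) + Ax - b)


def grad_J_KL(x):
    """Gradient (19): sum_i (1 - b_i / <a^i, x>) a^i, with a^i the rows of A."""
    return A.T @ (1.0 - b / (A @ x))


x = np.array([0.8, 1.2])
h = 1e-6
fd = np.array([(J_KL(x + h * e) - J_KL(x - h * e)) / (2 * h) for e in np.eye(2)])
print(grad_J_KL(x), fd)                 # the two vectors should agree closely
```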


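The class of EM-type algorithms is known to be closely related to KL minimization. The $k$th iterative step of the EM algorithm in $\mathbb{R}^n$ is given by

$$x^{k+1}_j = \frac{x^k_j}{\sum_{i=1}^m a^i_j} \sum_{i=1}^m a^i_j \frac{b_i}{\langle a^i, x^k\rangle}, \quad \text{for all } j = 1, 2, \ldots, n. \qquad (20)$$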


The following convergence results for the EM algorithm are well known. Let the initial point $x^0$ be positive. In the consistent case, any sequence $\{x^k\}_{k=0}^\infty$ generated by (20) converges to a solution of (7). In the inconsistent case, it converges to the minimizer of the Kullback-Leibler distance $KL(b, Ax)$ defined by (17) [51].

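It is known that the EM algorithm can be viewed as the following scaled gradient method, see, e.g., […], whose $k$th iterative step is

$$x^{k+1} = x^k - D_{EM}(x^k)\nabla J_{KL}(x^k), \qquad (21)$$

where the $n \times n$ diagonal scaling matrix is defined by

$$D_{EM}(x) := \operatorname{diag}\Big\{\frac{x_j}{\sum_{i=1}^m a^i_j}\Big\}. \qquad (22)$$

Thus the EM algorithm belongs to the class of PSG methods with $\tau_k = 1$ for all $k$ and the diagonal scaling matrix $D(x) = D_{EM}(x)$ for any $x$. The projection onto $\Omega_0$ is inactive here, because the right-hand side of (20) is nonnegative whenever $x^k$ is.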
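The identity between the multiplicative EM step (20) and the scaled gradient step (21)-(22) can be checked numerically. The sketch below uses small made-up data with positive entries and a positive starting point. It only illustrates the algebra above and is not a reconstruction code.

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.5, 1.0], [2.0, 0.3]])
b = np.array([3.0, 1.0, 2.0])
col_sums = A.sum(axis=0)                # sum_i a_j^i for each j


def em_step(x):
    """Multiplicative EM update (20)."""
    return x / col_sums * (A.T @ (b / (A @ x)))


def scaled_gradient_step(x):
    """Scaled gradient step (21) with D_EM(x) = diag(x_j / sum_i a_j^i), cf. (22)."""
    grad = A.T @ (1.0 - b / (A @ x))    # gradient (19)
    return x - (x / col_sums) * grad


x = np.array([1.0, 1.0])                # positive initial point
for _ in range(20):
    x_em, x_sg = em_step(x), scaled_gradient_step(x)
    assert np.allclose(x_em, x_sg)      # (20) and (21) coincide
    x = x_em
print(x)                                # iterates stay positive, so P_Omega0 is inactive
```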

More generally, generalized EM-type methods for (18) can be given by

$$x^{k+1} = P_{\Omega_0}\big(x^k - \tau_k D_{KL}(x^k)\nabla J_{KL}(x^k)\big), \qquad (23)$$

with $\{\tau_k\}_{k=0}^\infty$ as relaxation parameters […, Section 5.1] and $\{D_{KL}(x^k)\}_{k=0}^\infty$ as diagonal scaling matrices. The diagonal scaling matrices for the generalized EM-type methods are typically of the form, see, e.g., [29],

$$D_{KL}(x) := \operatorname{diag}\Big\{\frac{x_j}{s_j}\Big\}, \quad s_j > 0 \text{ for } j = 1, 2, \ldots, n, \qquad (24)$$

where each $s_j$ may depend on the linear system matrix $A$ of (7). When $s_j = \sum_{i=1}^m a^i_j$ for all $j$, $D_{KL}(x)$ coincides with the matrix $D_{EM}(x)$ given by (22).

It is natural to obtain incremental versions of PSG methods when the objective function $J(x)$ is separable, i.e., $J(x) = \sum_{i=1}^m J_i(x)$ for some integer $m$. Both the weighted LS functional (9) and the KL functional (17) are separable. This facilitates the derivation of incremental variants of the projected Landweber-type methods and the generalized EM-type methods. Incremental methods converge faster at early iterations, but relaxation strategies are required to guarantee asymptotic acceleration [50].


3 Convergence of the PSG Method with Outer Perturbations 

In this section, we present our main convergence results for the PSG method with bounded outer perturbations of the form (3)-(4). The stationary points of (1) are fixed points of $P_\Omega(x - \nabla J(x))$ […, Corollary 1.3.5], i.e., zeros of the residual function

$$r(x) := x - P_\Omega\big(x - \nabla J(x)\big). \qquad (25)$$

We denote the set of all these stationary points by

$$S := \{x \in \mathbb{R}^n \mid r(x) = 0\}, \qquad (26)$$

and assume that $S \ne \emptyset$. We also assume that (1) has a solution, and we let $J^* := \inf_{x \in \Omega} J(x)$. We will prove that sequences generated by a PSG method converge to a stationary point of (1) in the presence of bounded perturbations.
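To make the residual (25) concrete, the sketch below assumes $\Omega = \mathbb{R}^n_+$ and the toy objective $J(x) = \frac{1}{2}\|x - c\|^2$. Under these assumptions the unique stationary point is $\max(c, 0)$ componentwise. The residual vanishes there and does not vanish at another point.

```python
import numpy as np

c = np.array([1.0, -2.0, 3.0])
grad_J = lambda x: x - c          # J(x) = 0.5 * ||x - c||^2 (assumed example)


def residual(x):
    """r(x) = x - P_Omega(x - grad J(x)), eq. (25), with Omega = R^n_+."""
    return x - np.maximum(x - grad_J(x), 0.0)


x_star = np.maximum(c, 0.0)       # minimizer of J over x >= 0, hence a point of S
print(residual(x_star), residual(np.ones(3)))  # zero vector, then [0, 1, -2]
```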

We focus on objective functions $J(x)$ of (1) that belong to a subclass of convex functions. In the notation of [48, p. 65], we write $J \in S^{1,1}_{\mu,L}(\Omega)$. This means, first, that $\nabla J$ is Lipschitz continuous on $\Omega$ with Lipschitz constant $L$, i.e., there exists $L > 0$ such that

$$\|\nabla J(x) - \nabla J(y)\| \le L\|x - y\|, \quad \text{for all } x, y \in \Omega. \qquad (27)$$

Second, $J$ is strongly convex on $\Omega$ with strong convexity parameter $\mu$ ($L \ge \mu$), i.e., there exists $\mu > 0$ such that

$$J(y) \ge J(x) + \langle \nabla J(x), y - x\rangle + \frac{\mu}{2}\|y - x\|^2, \quad \text{for all } x, y \in \Omega. \qquad (28)$$
The convergence of gradient methods without perturbations for this subclass of convex functions is well established, see [35].

Motivated by recent works on superiorization [14], […], [33] and by the framework of feasible descent methods [38], we investigate the convergence of the PSG method with bounded perturbations for (1), that is,

$$x^{k+1} = P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k) + e(x^k)\big). \qquad (29)$$

Here $\{\tau_k\}_{k=0}^\infty$ is a sequence of positive scalars with

$$0 < \inf_k \tau_k \le \tau_k \le \sup_k \tau_k < 2/L, \qquad (30)$$

and $\{D(x^k)\}_{k=0}^\infty$ is a sequence of diagonal scaling matrices. Denoting $e^k := e(x^k)$, the sequence of perturbations is assumed to be summable, i.e.,

$$\sum_{k=0}^\infty \|e^k\| < +\infty. \qquad (31)$$

To ensure that the scaled gradient direction does not deviate too much from the gradient direction, we define

$$\theta^k := \nabla J(x^k) - D(x^k)\nabla J(x^k), \qquad (32)$$

and assume that

$$\sum_{k=0}^\infty \|\theta^k\| < +\infty. \qquad (33)$$


3.1 Preliminary Results 

In this subsection, we collect the facts and conditions needed for our convergence analysis. The following lemmas are used in subsequent proofs. The first one is known as the descent lemma for a function with Lipschitz continuous gradient, see […, Proposition A.24].

Lemma 3.1 Let $J : \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function whose gradient is Lipschitz continuous with constant $L$. Then, for any $L' \ge L$,

$$J(x) \le J(y) + \langle \nabla J(y), x - y\rangle + \frac{L'}{2}\|x - y\|^2, \quad \text{for all } x, y \in \mathbb{R}^n. \qquad (34)$$

The second lemma recalls well-known characterizations of projections onto convex sets, see, e.g., […, Proposition 2.1.3] or […, Fig. 11].

Lemma 3.2 Let $\Omega$ be a nonempty, closed and convex subset of $\mathbb{R}^n$. Then the orthogonal projection onto $\Omega$ has the following properties:

(i) For any $x \in \mathbb{R}^n$, the projection $P_\Omega(x)$ of $x$ onto $\Omega$ satisfies

$$\langle x - P_\Omega(x),\, y - P_\Omega(x)\rangle \le 0, \quad \forall y \in \Omega. \qquad (35)$$

(ii) $P_\Omega$ is a nonexpansive operator, i.e.,

$$\|P_\Omega(x) - P_\Omega(y)\| \le \|x - y\|, \quad \forall x, y \in \mathbb{R}^n. \qquad (36)$$

The third lemma is a property of the orthogonal projection operator that was proposed in […, Lemma 1], see also […, Lemma 2.3.1].

Lemma 3.3 Let $\Omega$ be a nonempty, closed and convex subset of $\mathbb{R}^n$. Given $x \in \mathbb{R}^n$ and $d \in \mathbb{R}^n$, the function $\psi(t)$ defined by

$$\psi(t) := \frac{\|P_\Omega(x + td) - x\|}{t} \qquad (37)$$

is monotonically nonincreasing for $t > 0$.
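Lemma 3.3 can be checked numerically in a simple case. The sketch below assumes $\Omega = \mathbb{R}^5_+$ and takes a random point $x \in \Omega$, which is the setting in which the lemma is later applied to the iterates $x^k$. It also takes a random direction $d$ and evaluates $\psi(t)$ from (37) on a grid. The values should not increase, up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.maximum(rng.standard_normal(5), 0.0)  # a point of Omega = R^5_+ (assumed set)
d = rng.standard_normal(5)                   # an arbitrary direction


def psi(t):
    """psi(t) = ||P_Omega(x + t d) - x|| / t, cf. (37), with P_Omega = clipping at 0."""
    return np.linalg.norm(np.maximum(x + t * d, 0.0) - x) / t


ts = np.linspace(0.01, 10.0, 200)
values = np.array([psi(t) for t in ts])
print(np.all(np.diff(values) <= 1e-10))      # True: psi does not increase along the grid
```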

The fourth lemma is from [46, Lemma 2.2], which originates from […, Lemma 2.1]. See also [52, Lemma 3.1] or […, p. 44, Lemma 2] for a more general formulation.

Lemma 3.4 Let $\{\alpha_k\}_{k=0}^\infty \subset \mathbb{R}_+$ be a sequence of nonnegative real numbers. Suppose that $0 \le \alpha_{k+1} \le \alpha_k + \varepsilon_k$ for all $k \ge 0$, where $\varepsilon_k \ge 0$ for all $k \ge 0$ and $\sum_{k=0}^\infty \varepsilon_k < +\infty$. Then the sequence $\{\alpha_k\}_{k=0}^\infty$ converges.

In our analysis we make use of the following two conditions, which are Assumptions A and B, respectively, in […]. They are called the “local error bound” condition and the “proper separation of isocost surfaces” condition. The error bound condition estimates the distance of an $x \in \Omega$ to the solution set $S$, defined above, by the norm of the residual function; see […] for a comprehensive review. We denote the distance from a point $x$ to the set $S$ by $d(x, S) := \min_{y \in S}\|x - y\|$.

Condition 1 For every $v \ge J^*$, there exist scalars $\varepsilon > 0$ and $\beta > 0$ such that

$$d(x, S) \le \beta\|r(x)\| \qquad (38)$$

for all $x \in \Omega$ with $J(x) \le v$ and $\|r(x)\| \le \varepsilon$.

The second condition says that the isocost surfaces of the function $J(x)$ on the solution set $S$ should be properly separated. It is known to hold for any convex function […, p. 161].

Condition 2 There exists a scalar $\varepsilon > 0$ such that

$$\text{if } u, v \in S \text{ and } J(u) \ne J(v), \text{ then } \|u - v\| \ge \varepsilon. \qquad (39)$$

Next, we show that the above two conditions are satisfied by functions belonging to $S^{1,1}_{\mu,L}(\Omega)$. Condition 2 holds for any strongly convex function, so we only need to prove that Condition 1 is also fulfilled. The proof of the next lemma, which establishes this fact, has its early roots in Theorem 3.1 of [50].

Lemma 3.5 The error bound condition holds globally for any $J \in S^{1,1}_{\mu,L}(\Omega)$.
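Proof By the definition of the residual function (25), we have

$$x - r(x) = P_\Omega\big(x - \nabla J(x)\big) \in \Omega. \qquad (40)$$

For any given $x^* \in S$, the optimality condition of problem (1) (see, e.g., […, p. 203, Theorem 3] or [3, Proposition 2.1.2]) gives

$$\langle \nabla J(x^*), x - x^*\rangle \ge 0, \quad \forall x \in \Omega. \qquad (41)$$

Since $x - r(x) \in \Omega$ for all $x$, (41) yields

$$\langle -\nabla J(x^*),\, x - r(x) - x^*\rangle \le 0. \qquad (42)$$

From Lemma 3.2 (i) and (40), we get

$$\begin{aligned} &\big\langle (x - \nabla J(x)) - P_\Omega(x - \nabla J(x)),\ x^* - P_\Omega(x - \nabla J(x))\big\rangle \le 0 \\ \Longrightarrow\ &\big\langle (x - \nabla J(x)) - (x - r(x)),\ x^* - (x - r(x))\big\rangle \le 0 \\ \Longrightarrow\ &\langle \nabla J(x) - r(x),\ x - r(x) - x^*\rangle \le 0 \\ \Longrightarrow\ &\langle \nabla J(x),\ x - r(x) - x^*\rangle \le \langle r(x),\ x - r(x) - x^*\rangle. \end{aligned} \qquad (43)$$

Summing both sides of (42) and (43) yields the following. In the second step we drop the nonpositive term $-\|r(x)\|^2$.

$$\begin{aligned} &\langle \nabla J(x) - \nabla J(x^*),\ x - r(x) - x^*\rangle \le \langle r(x),\ x - r(x) - x^*\rangle \\ \Longrightarrow\ &\langle \nabla J(x) - \nabla J(x^*),\ x - x^*\rangle \le \langle r(x),\ \nabla J(x) - \nabla J(x^*) + x - x^*\rangle. \end{aligned} \qquad (44)$$

By the strong convexity of $J(x)$, we have […, Theorem 2.1.9]

$$\langle \nabla J(x) - \nabla J(x^*),\, x - x^*\rangle \ge \mu\|x - x^*\|^2. \qquad (45)$$

Combining (44) with (45) leads to

$$\begin{aligned} \mu\|x - x^*\|^2 &\le \langle r(x),\ \nabla J(x) - \nabla J(x^*) + x - x^*\rangle \\ &\le \big(\|\nabla J(x) - \nabla J(x^*)\| + \|x - x^*\|\big)\|r(x)\| \\ &\le (L + 1)\|x - x^*\|\,\|r(x)\| \\ \Longrightarrow\ \|x - x^*\| &\le \frac{L+1}{\mu}\|r(x)\|, \end{aligned} \qquad (46)$$

and, hence,

$$d(x, S) \le \frac{L+1}{\mu}\|r(x)\|. \qquad (47)$$

Consequently, the error bound condition (38), namely Condition 1, holds.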
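The sketch below is a numerical sanity check of the bound (47). The assumptions are made only for illustration: $\Omega = \mathbb{R}^4_+$ and a strongly convex quadratic $J(x) = \frac{1}{2}x^T Q x - q^T x$. For this $J$, $\mu$ and $L$ are the extreme eigenvalues of $Q$, and $S$ consists of the single minimizer $x^*$. The minimizer is approximated by many unperturbed projected gradient steps. The ratio $\|x - x^*\|/\|r(x)\|$ is then compared with $(L+1)/\mu$ at random points of $\Omega$.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
Q = M.T @ M + 0.5 * np.eye(4)        # symmetric positive definite Hessian (assumed example)
q = rng.standard_normal(4)
grad_J = lambda x: Q @ x - q         # J(x) = 0.5 x^T Q x - q^T x
eigs = np.linalg.eigvalsh(Q)         # eigenvalues in ascending order
mu, L = eigs[0], eigs[-1]            # strong convexity and Lipschitz constants


def residual(x):
    return x - np.maximum(x - grad_J(x), 0.0)   # eq. (25) with Omega = R^4_+


# Approximate the unique stationary point x* by projected gradient with tau = 1/L.
x_star = np.zeros(4)
for _ in range(20000):
    x_star = np.maximum(x_star - grad_J(x_star) / L, 0.0)

ratios = []
for _ in range(1000):
    x = np.abs(rng.standard_normal(4)) * 3.0    # random point of Omega
    ratios.append(np.linalg.norm(x - x_star) / np.linalg.norm(residual(x)))
print(max(ratios), (L + 1) / mu)                 # the first number should not exceed the second
```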
3.2 Convergence Analysis

In this subsection, we give the detailed convergence analysis for the PSG method with bounded outer perturbations of (29). The proof techniques follow the track of […, 44, 45, 46] and extend them to our case. We first prove the convergence of the sequence of objective function values $\{J(x^k)\}_{k=0}^\infty$ at points of any sequence $\{x^k\}_{k=0}^\infty$ generated by the PSG method with bounded outer perturbations of (29). We then prove that any such sequence of points converges to a stationary point.

The following proposition estimates the difference of objective function values between successive iterations in the presence of bounded perturbations.

Proposition 3.1 Let $\Omega \subseteq \mathbb{R}^n$ be a nonempty closed convex set. Assume that $J(x)$ is strongly convex on $\Omega$ with convexity parameter $\mu$, and that $\nabla J$ is Lipschitz continuous on $\Omega$ with Lipschitz constant $L$ such that $L \ge \mu$. Further, assume the following:

- $\{\tau_k\}_{k=0}^\infty$ is a sequence of positive scalars that fulfills (30);
- $\{e^k\}_{k=0}^\infty$ is a sequence of perturbation vectors as defined above that fulfills (31);
- $\{\theta^k\}_{k=0}^\infty$ is as in (32) and satisfies (33).

If $\{x^k\}_{k=0}^\infty$ is any sequence generated by the PSG method with bounded outer perturbations of (29), then there exists an $\eta_1 > 0$ such that

$$J(x^k) - J(x^{k+1}) \ge \eta_1\|x^k - x^{k+1}\|^2 - \|\delta^k\|\,\|x^k - x^{k+1}\|, \qquad (48)$$

where $\delta^k$ is defined via the above-mentioned $e^k$ and $\theta^k$ by

$$\delta^k := \theta^k + \frac{e^k}{\tau_k}. \qquad (49)$$

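Proof Lemma 3.1 implies that

$$J(x^k) - J(x^{k+1}) \ge \langle \nabla J(x^k),\, x^k - x^{k+1}\rangle - \frac{L}{2}\|x^k - x^{k+1}\|^2. \qquad (50)$$

By (29) and Lemma 3.2 (i), we have

$$\big\langle x^{k+1} - \big(x^k - \tau_k D(x^k)\nabla J(x^k) + e^k\big),\ x^k - x^{k+1}\big\rangle \ge 0. \qquad (51)$$

Rearranging the last relation and using (32) leads to

$$\langle \nabla J(x^k),\, x^k - x^{k+1}\rangle \ge \frac{1}{\tau_k}\|x^k - x^{k+1}\|^2 + \frac{1}{\tau_k}\langle e^k,\, x^k - x^{k+1}\rangle + \langle \theta^k,\, x^k - x^{k+1}\rangle. \qquad (52)$$

By (49) and the Cauchy-Schwarz inequality we then obtain

$$\langle \nabla J(x^k),\, x^k - x^{k+1}\rangle \ge \frac{1}{\tau_k}\|x^k - x^{k+1}\|^2 - \|\delta^k\|\,\|x^k - x^{k+1}\|. \qquad (53)$$

Combining (50) with (53) leads to

$$J(x^k) - J(x^{k+1}) \ge \Big(\frac{1}{\tau_k} - \frac{L}{2}\Big)\|x^k - x^{k+1}\|^2 - \|\delta^k\|\,\|x^k - x^{k+1}\|. \qquad (54)$$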
Define $\bar\tau := \sup_k \tau_k$, so that $1/\tau_k \ge 1/\bar\tau$ for all $k$, and

$$\eta_1 := \frac{1}{\bar\tau} - \frac{L}{2}. \qquad (55)$$

This quantity is positive by (30), which completes the proof.
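Inequality (48) can be observed along a perturbed PSG run. The sketch below uses hypothetical choices throughout. The objective is a strongly convex quadratic and $\Omega = \mathbb{R}^3_+$. The scaling is $D(x^k) = (1 + 0.5^k) I_3$, which makes $\theta^k \to 0$. The perturbations $e^k$ are random with norms decaying like $0.8^k$. The stepsize is $\tau_k = 1/L$, so $\eta_1 = L/2$. The sketch is a consistency check of the proposition, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
Q = M.T @ M + np.eye(3)
q = rng.standard_normal(3)
J = lambda x: 0.5 * x @ Q @ x - q @ x
grad_J = lambda x: Q @ x - q
L = np.linalg.eigvalsh(Q)[-1]          # Lipschitz constant of grad J

tau = 1.0 / L                          # constant stepsize, so sup_k tau_k = 1/L < 2/L
eta1 = 1.0 / tau - L / 2.0             # eta_1 from (55)
x = np.abs(rng.standard_normal(3))     # start inside Omega = R^3_+
for k in range(100):
    D = np.diag(1.0 + 0.5**k * np.ones(3))      # hypothetical scaling with summable theta^k
    e = 0.8**k * rng.standard_normal(3) / 10.0  # hypothetical summable perturbation
    g = grad_J(x)
    theta = g - D @ g                            # eq. (32)
    delta = theta + e / tau                      # eq. (49)
    x_new = np.maximum(x - tau * D @ g + e, 0.0) # iteration (29)
    step = np.linalg.norm(x - x_new)
    lhs = J(x) - J(x_new)
    rhs = eta1 * step**2 - np.linalg.norm(delta) * step
    assert lhs >= rhs - 1e-10                    # inequality (48)
    x = x_new
```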

From Proposition 3.1 and Lemma 3.4 we obtain the following theorem on the convergence of objective function values.

Theorem 3.1 Suppose problem (1) has a solution, namely $J^* = J(x^*)$ for some $x^* \in \Omega$. Then, under the conditions of Proposition 3.1, the sequence of function values $\{J(x^k)\}_{k=0}^\infty$ converges, where $\{x^k\}_{k=0}^\infty$ is any sequence generated by the PSG method with bounded outer perturbations of (29).

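Proof From Proposition 3.1 we further get

$$J(x^k) - J(x^{k+1}) \ge \eta_1\Big(\|x^k - x^{k+1}\| - \frac{\|\delta^k\|}{2\eta_1}\Big)^2 - \frac{\|\delta^k\|^2}{4\eta_1} \ge -\frac{\|\delta^k\|^2}{4\eta_1}. \qquad (56)$$

Since $J(x) \ge J^*$ for all $x \in \Omega$, the above relation implies that

$$0 \le J(x^{k+1}) - J^* \le J(x^k) - J^* + \frac{\|\delta^k\|^2}{4\eta_1}. \qquad (57)$$

Define $\underline{\tau} := \inf_k \tau_k$. Minkowski's inequality gives

$$\|\delta^k\|^2 \le \Big(\frac{1}{\tau_k}\|e^k\| + \|\theta^k\|\Big)^2 \le \frac{2}{\underline{\tau}^2}\|e^k\|^2 + 2\|\theta^k\|^2. \qquad (58)$$

By (31) and (33), this implies $\sum_{k=0}^\infty \|\delta^k\|^2 < +\infty$, because a summable sequence of nonnegative numbers tends to zero and is therefore also square-summable. Then, by Lemma 3.4 and (57), the sequence $\{J(x^k) - J^*\}_{k=0}^\infty$ converges. Hence the sequence $\{J(x^k)\}_{k=0}^\infty$ also converges.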

We now prove that any sequence generated by the PSG method with bounded outer perturbations of (29) converges to a stationary point in $S$. The following propositions lead to that result. The first shows that $\|x^k - x^{k+1}\|$ is bounded above in terms of the difference between the objective function values at the corresponding points, plus a perturbation term.

Proposition 3.2 Under the conditions of Proposition 3.1, let $\{x^k\}_{k=0}^\infty$ be any sequence generated by the PSG method with bounded outer perturbations of (29). Let $\eta_1$ be given by (55) and $\delta^k$ by (49). Then

$$\|x^k - x^{k+1}\| \le \sqrt{\frac{2}{\eta_1}}\,\big|J(x^k) - J(x^{k+1})\big|^{1/2} + \frac{1}{\eta_1}\|\delta^k\|. \qquad (59)$$



14 


W. Jin, Y. Censor and M. Jiang 


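Proof By the basic inequality $(p + q)^2 \le 2(p^2 + q^2)$, $\forall p, q \in \mathbb{R}$, we can write

$$\|x^k - x^{k+1}\|^2 \le 2\Big(\|x^k - x^{k+1}\| - \frac{\|\delta^k\|}{2\eta_1}\Big)^2 + \frac{\|\delta^k\|^2}{2\eta_1^2}. \qquad (60)$$

From (56) and (60), we have

$$\|x^k - x^{k+1}\|^2 \le \frac{2}{\eta_1}\big(J(x^k) - J(x^{k+1})\big) + \frac{\|\delta^k\|^2}{\eta_1^2} \le \frac{2}{\eta_1}\big|J(x^k) - J(x^{k+1})\big| + \frac{\|\delta^k\|^2}{\eta_1^2}. \qquad (61)$$

Applying the inequality $\sqrt{a^2 + b^2} \le a + b$, $\forall a, b \ge 0$, yields (59).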

The next proposition gives an upper bound on the residual function (25) in the presence of bounded perturbations.

Proposition 3.3 Under the conditions of Proposition 3.1, let $\{x^k\}_{k=0}^\infty$ be any sequence generated by the PSG method with bounded outer perturbations of (29). Then there exists a constant $\eta_2 > 0$ such that the residual function (25) satisfies, for all $k \ge 0$,

$$\|r(x^k)\| \le \eta_2\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big). \qquad (62)$$

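Proof From (29) and (36), it holds that

$$\big\|x^{k+1} - P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k)\big)\big\| \le \|e^k\|. \qquad (63)$$

Then we get

$$\begin{aligned} \big\|x^k - P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k)\big)\big\| &\le \|x^k - x^{k+1}\| + \big\|x^{k+1} - P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k)\big)\big\| \\ &\le \|x^k - x^{k+1}\| + \|e^k\|. \end{aligned} \qquad (64)$$

By Lemma 3.3, the left-hand side of (64) is bounded below according to

$$\big\|x^k - P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k)\big)\big\| \ge \tilde\tau\,\big\|x^k - P_\Omega\big(x^k - D(x^k)\nabla J(x^k)\big)\big\|, \qquad (65)$$

with $\tilde\tau := \min\{1, \inf_k \tau_k\} > 0$. By (64) and (65), we then obtain

$$\big\|x^k - P_\Omega\big(x^k - D(x^k)\nabla J(x^k)\big)\big\| \le \frac{1}{\tilde\tau}\big(\|x^k - x^{k+1}\| + \|e^k\|\big). \qquad (66)$$

By the nonexpansiveness (36) of the projection operator and the triangle inequality, the residual function defined by (25) satisfies

$$\begin{aligned} \|r(x^k)\| &\le \big\|x^k - P_\Omega\big(x^k - D(x^k)\nabla J(x^k)\big)\big\| + \big\|P_\Omega\big(x^k - D(x^k)\nabla J(x^k)\big) - P_\Omega\big(x^k - \nabla J(x^k)\big)\big\| \\ &\le \big\|x^k - P_\Omega\big(x^k - D(x^k)\nabla J(x^k)\big)\big\| + \|\nabla J(x^k) - D(x^k)\nabla J(x^k)\| \\ &\le \frac{1}{\tilde\tau}\big(\|x^k - x^{k+1}\| + \|e^k\|\big) + \|\theta^k\|. \end{aligned} \qquad (67)$$

Choosing $\eta_2 := 1/\tilde\tau$ (note that $1/\tilde\tau \ge 1$) completes the proof.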
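The bound (62) with $\eta_2 = 1/\tilde\tau$ can also be checked along a perturbed run. The sketch below makes the same kind of hypothetical choices as before: a strongly convex quadratic on $\Omega = \mathbb{R}^3_+$, $D(x^k) = (1 + 0.5^k) I_3$, and random summable $e^k$. The stepsize is $\tau_k = 1/L \le 1$, so $\tilde\tau = 1/L$. The iterates stay in $\Omega$, which is the setting in which Lemma 3.3 is used in (65).

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3))
Q = M.T @ M + np.eye(3)                 # J(x) = 0.5 x^T Q x - q^T x (assumed example)
q = rng.standard_normal(3)
grad_J = lambda x: Q @ x - q
L = np.linalg.eigvalsh(Q)[-1]           # L >= 1 here, so tau = 1/L <= 1
tau = 1.0 / L                           # constant stepsize
eta2 = 1.0 / min(1.0, tau)              # eta_2 = 1 / tau_tilde, as in the proof

x = np.abs(rng.standard_normal(3))      # start inside Omega = R^3_+
for k in range(100):
    D = np.diag(1.0 + 0.5**k * np.ones(3))       # hypothetical scaling, theta^k -> 0
    e = 0.8**k * rng.standard_normal(3) / 10.0   # hypothetical summable perturbation
    g = grad_J(x)
    theta = g - D @ g                             # eq. (32)
    x_new = np.maximum(x - tau * D @ g + e, 0.0)  # iteration (29)
    r = x - np.maximum(x - g, 0.0)                # residual (25)
    bound = eta2 * (np.linalg.norm(x - x_new) + np.linalg.norm(e) + np.linalg.norm(theta))
    assert np.linalg.norm(r) <= bound + 1e-10     # inequality (62)
    x = x_new
```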
The next proposition estimates the difference between the objective function value at the current iterate and the optimal value. The proof is inspired by that of [45, Theorem 3.1].

Proposition 3.4 Under the conditions of Proposition 3.1, let $\{x^k\}_{k=0}^\infty$ be any sequence generated by the PSG method with bounded outer perturbations of (29). Then there exist a constant $\eta_3 > 0$ and an index $K_3 > 0$ such that for all $k \ge K_3$

$$J(x^{k+1}) - J^* \le \eta_3\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big)^2. \qquad (68)$$

Proof By (31) and (33), $\lim_{k\to\infty}\|e^k\| = 0$ and $\lim_{k\to\infty}\|\theta^k\| = 0$, respectively. Hence $\lim_{k\to\infty}\|\delta^k\| = 0$. Then Theorem 3.1 and Proposition 3.2 imply that

$$\lim_{k\to\infty}\|x^k - x^{k+1}\| = 0, \qquad (69)$$

and Proposition 3.3 shows that

$$\lim_{k\to\infty}\|r(x^k)\| = 0. \qquad (70)$$

Condition 1 guarantees that there exist an index $K_2 \ge K_1$ and a scalar $\beta > 0$ such that for all $k \ge K_2$

$$\|x^k - \bar{x}^k\| \le \beta\|r(x^k)\|, \qquad (71)$$

where $\bar{x}^k \in S$ is a point for which $d(x^k, S) = \|x^k - \bar{x}^k\|$. The last two relations, (70) and (71), then imply that

$$\lim_{k\to\infty}(x^k - \bar{x}^k) = 0, \qquad (72)$$

and, using the triangle inequality and (69), we get

$$\lim_{k\to\infty}(\bar{x}^k - \bar{x}^{k+1}) = 0. \qquad (73)$$

Since $\bar{x}^k \in S$ for all $k \ge 0$, Condition 2 and (73) imply that there exist an integer $K_3 \ge K_2$ and a scalar $J^\infty$ such that

$$J(\bar{x}^k) = J^\infty, \quad \text{for all } k \ge K_3. \qquad (74)$$

Next we show that $J^\infty = J^*$. For any $k \ge K_3$, since $\bar{x}^k$ is a stationary point of $J(x)$ over $\Omega$,

$$\langle \nabla J(\bar{x}^k),\, x - \bar{x}^k\rangle \ge 0, \quad \forall x \in \Omega. \qquad (75)$$

From the optimality condition of constrained convex optimization [3, Proposition 2.1.2], we obtain that

$$J(x) \ge J(\bar{x}^k) = J^\infty, \quad \forall x \in \Omega. \qquad (76)$$
[Fig. 1 An illustration of the geometric relationship between the points involved in (78), including $x^k - \tau_k D(x^k)\nabla J(x^k) + e(x^k)$ and its projection $x^{k+1}$ onto $\Omega$.]


Recall that $r(\bar{x}^k) = 0$ implies $\bar{x}^k = P_\Omega(\bar{x}^k - \nabla J(\bar{x}^k)) \in \Omega$. So, by the definition of $J^*$, $J^\infty = J(\bar{x}^k) \ge J^*$. On the other hand, (76) gives $J^* = \inf_{x \in \Omega} J(x) \ge J^\infty$. Hence

$$J^\infty = J^*. \qquad (77)$$

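Since $\Omega$ is convex and $x^{k+1}$ is the projection of $x^k - \tau_k D(x^k)\nabla J(x^k) + e^k$ onto $\Omega$ (see Fig. 1), Lemma 3.2 (i) gives

$$\big\langle x^k - \tau_k D(x^k)\nabla J(x^k) + e(x^k) - x^{k+1},\ x^{k+1} - \bar{x}^k\big\rangle \ge 0. \qquad (78)$$

Rearranging the terms and using (32) leads to

$$\begin{aligned} \langle \nabla J(x^k),\, x^{k+1} - \bar{x}^k\rangle &\le \Big\langle \theta^k + \frac{1}{\tau_k}e^k,\ x^{k+1} - \bar{x}^k\Big\rangle + \frac{1}{\tau_k}\langle x^k - x^{k+1},\ x^{k+1} - \bar{x}^k\rangle \\ &\le \Big(\|\theta^k\| + \frac{1}{\underline{\tau}}\|e^k\| + \frac{1}{\underline{\tau}}\|x^k - x^{k+1}\|\Big)\|x^{k+1} - \bar{x}^k\|, \end{aligned} \qquad (79)$$

where $\underline{\tau} := \inf_k \tau_k$, as defined in (58). By the mean value theorem, there is an $\tilde{x}^k$ on the line segment between $x^{k+1}$ and $\bar{x}^k$ such that

$$J(x^{k+1}) - J(\bar{x}^k) = \langle \nabla J(\tilde{x}^k),\, x^{k+1} - \bar{x}^k\rangle. \qquad (80)$$

Since $k \ge K_3 \ge K_2 \ge K_1$, combining (79) and (80) in view of (74) and (77) yields

$$\begin{aligned} J(x^{k+1}) - J^* &= J(x^{k+1}) - J(\bar{x}^k) \\ &= \langle \nabla J(\tilde{x}^k) - \nabla J(x^k),\, x^{k+1} - \bar{x}^k\rangle + \langle \nabla J(x^k),\, x^{k+1} - \bar{x}^k\rangle \\ &\le \|\nabla J(\tilde{x}^k) - \nabla J(x^k)\|\,\|x^{k+1} - \bar{x}^k\| + \langle \nabla J(x^k),\, x^{k+1} - \bar{x}^k\rangle \\ &\le \Big(L\|\tilde{x}^k - x^k\| + \|\theta^k\| + \frac{1}{\underline{\tau}}\|e^k\| + \frac{1}{\underline{\tau}}\|x^k - x^{k+1}\|\Big)\|x^{k+1} - \bar{x}^k\|. \end{aligned} \qquad (81)$$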
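To finish the proof we further bound from above the right-hand side of (81). For the term $L\|\tilde{x}^k - x^k\|$ we note that $\tilde{x}^k$ is in the line segment between $x^{k+1}$ and $\bar{x}^k$, and thus

$$\|x^{k+1} - \tilde{x}^k\| + \|\tilde{x}^k - \bar{x}^k\| = \|x^{k+1} - \bar{x}^k\| \le \|x^k - x^{k+1}\| + \|x^k - \bar{x}^k\|, \tag{82}$$

which, when combined with

$$\|\tilde{x}^k - x^k\| \le \|x^k - x^{k+1}\| + \|x^{k+1} - \tilde{x}^k\| \tag{83}$$

and

$$\|\tilde{x}^k - x^k\| \le \|x^k - \bar{x}^k\| + \|\bar{x}^k - \tilde{x}^k\|, \tag{84}$$

yields, upon adding (83) and (84) and using (82),

$$\|\tilde{x}^k - x^k\| \le \|x^k - x^{k+1}\| + \|x^k - \bar{x}^k\|. \tag{85}$$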

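On the other hand, (70) and (71) allow us to write

$$\|x^k - \bar{x}^k\| \le \beta\eta_2\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big), \quad \text{for all } k > K_2. \tag{86}$$

Thus, for the term $L\|\tilde{x}^k - x^k\|$, using (85) and (86), we have

$$\begin{aligned}
L\|\tilde{x}^k - x^k\| &\le L\big(\|x^k - x^{k+1}\| + \|x^k - \bar{x}^k\|\big) \\
&\le L\Big(\|x^k - x^{k+1}\| + \beta\eta_2\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big)\Big) \\
&\le L(1 + \beta\eta_2)\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big).
\end{aligned} \tag{87}$$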

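For the term $\|x^{k+1} - \bar{x}^k\|$ in (81), we use the triangle inequality and (86) to get

$$\begin{aligned}
\|x^{k+1} - \bar{x}^k\| &\le \|x^k - x^{k+1}\| + \|x^k - \bar{x}^k\| \\
&\le \|x^k - x^{k+1}\| + \beta\eta_2\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big) \\
&\le (1 + \beta\eta_2)\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big).
\end{aligned} \tag{88}$$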

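Finally, the term $\|\theta^k\| + \frac{1}{\tau}\|e^k\| + \frac{1}{\tau}\|x^k - x^{k+1}\|$ in the right-hand side of (81) can also be bounded above by

$$\|\theta^k\| + \tfrac{1}{\tau}\|e^k\| + \tfrac{1}{\tau}\|x^k - x^{k+1}\| \le \Big(1 + \tfrac{1}{\tau}\Big)\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big). \tag{89}$$

Defining

$$\eta_3 := \Big(L + L\beta\eta_2 + 1 + \tfrac{1}{\tau}\Big)(1 + \beta\eta_2), \tag{90}$$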

and using all the bounds from above, i.e., (81), (87), (88) and (89), we obtain

$$J(x^{k+1}) - J^* \le \eta_3\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big)^2, \quad \text{for all } k > K_3, \tag{91}$$

which completes the proof. 
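Two elementary facts carry the argument above: the projection inequality (78), tested at a point of $\Omega$, and the segment bound (85). The following is a minimal numerical sketch of both, assuming a box $\Omega = [0,1]^n$ so that $P_\Omega$ is componentwise clipping; the random vectors are stand-ins for $x^k - \tau_k D(x^k)\nabla J(x^k) + e^k$, $x^k$, $\bar{x}^k$ and $\tilde{x}^k$, and the helper names are hypothetical, not taken from the paper.

```python
import numpy as np

def proj_box(z, lo=0.0, hi=1.0):
    """Orthogonal projection onto the box [lo, hi]^n (componentwise clipping)."""
    return np.clip(z, lo, hi)

def check_projection_and_segment(n=6, trials=1000, seed=0):
    """Return (smallest left-hand side of (78), largest violation of (85))."""
    rng = np.random.default_rng(seed)
    worst_vi, worst_seg = np.inf, -np.inf
    for _ in range(trials):
        z = 3.0 * rng.standard_normal(n)          # stands in for x^k - tau_k D(x^k) grad J(x^k) + e^k
        x_next = proj_box(z)                      # x^{k+1} = P_Omega(z)
        x_bar = rng.random(n)                     # any point of Omega, playing the role of \bar x^k
        x_k = rng.random(n)                       # stands in for x^k
        t = rng.random()
        x_tilde = t * x_next + (1.0 - t) * x_bar  # a point on the segment [x^{k+1}, \bar x^k]
        worst_vi = min(worst_vi, np.dot(z - x_next, x_next - x_bar))
        gap = np.linalg.norm(x_tilde - x_k) - (np.linalg.norm(x_k - x_next)
                                               + np.linalg.norm(x_k - x_bar))
        worst_seg = max(worst_seg, gap)
    return worst_vi, worst_seg

vi, seg = check_projection_and_segment()
print(vi >= -1e-12, seg <= 1e-12)   # True True
```

The box is chosen only because its projection is explicit; (78) holds for any nonempty closed convex $\Omega$, while (85) is purely a consequence of the triangle inequality.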
Combining Theorem 3.1, Proposition 3.2 and Proposition 3.4, it can be seen that $\lim_{k\to\infty} J(x^k) = J^*$. As an immediate application of Proposition 2.3, we get the following intermediate proposition that leads to the final result.

Proposition 3.5 Under the conditions of Proposition 3.4, if $\{x^k\}_{k=0}^\infty$ is any sequence generated by the PSG method with bounded outer perturbations of (29), and if $\lambda_k := \sqrt{J(x^k) - J^*}$ for all $k \ge 0$, then $\sum_{k=0}^\infty \lambda_k < +\infty$.

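Proof There exist real numbers $0 < \eta_4 < 1$ and $\eta_5 > 0$ such that

$$\sqrt{J(x^{k+1}) - J^*} \le \eta_4\sqrt{J(x^k) - J^*} + \eta_5\big(\|e^k\| + \|\theta^k\|\big). \tag{92}$$

To prove this claim, we use $(a + b)^2 \le 2(a^2 + b^2)$ and (91) to get

$$\begin{aligned}
J(x^{k+1}) - J^* &\le \eta_3\big(\|x^k - x^{k+1}\| + \|e^k\| + \|\theta^k\|\big)^2 \\
&\le 2\eta_3\|x^k - x^{k+1}\|^2 + 2\eta_3\big(\|e^k\| + \|\theta^k\|\big)^2,
\end{aligned} \tag{93}$$

then apply (?), with $J^*$ added and subtracted, to obtain

$$J(x^{k+1}) - J^* \le \frac{4\eta_3}{\eta_1}\big(J(x^k) - J^*\big) - \frac{4\eta_3}{\eta_1}\big(J(x^{k+1}) - J^*\big) + \frac{2\eta_3}{\eta_1^2}\|\delta^k\|^2 + 2\eta_3\big(\|e^k\| + \|\theta^k\|\big)^2. \tag{94}$$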


Rearranging terms yields

$$J(x^{k+1}) - J^* \le \frac{4\eta_3}{\eta_1 + 4\eta_3}\big(J(x^k) - J^*\big) + \frac{2\eta_3}{\eta_1(\eta_1 + 4\eta_3)}\|\delta^k\|^2 + \frac{2\eta_1\eta_3}{\eta_1 + 4\eta_3}\big(\|e^k\| + \|\theta^k\|\big)^2. \tag{95}$$

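On the other hand, (?) leads to

$$\|\delta^k\|^2 \le \frac{1}{\tau^2}\|e^k\|^2 + \|\theta^k\|^2 \le \frac{1}{\tilde{\tau}^2}\big(\|e^k\|^2 + \|\theta^k\|^2\big), \tag{96}$$

with $\tau := \inf_k \tau_k$ and $\tilde{\tau} := \min\{1, \inf_k \tau_k\} > 0$ as defined earlier. Therefore,

$$J(x^{k+1}) - J^* \le \frac{4\eta_3}{\eta_1 + 4\eta_3}\big(J(x^k) - J^*\big) + \Big(\frac{2\eta_3}{\eta_1(\eta_1 + 4\eta_3)}\,\frac{1}{\tilde{\tau}^2} + \frac{2\eta_1\eta_3}{\eta_1 + 4\eta_3}\Big)\big(\|e^k\| + \|\theta^k\|\big)^2. \tag{97}$$

Using $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ gives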
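$$\sqrt{J(x^{k+1}) - J^*} \le \sqrt{\frac{4\eta_3}{\eta_1 + 4\eta_3}}\,\sqrt{J(x^k) - J^*} + \sqrt{\frac{2\eta_3}{\eta_1(\eta_1 + 4\eta_3)}\,\frac{1}{\tilde{\tau}^2} + \frac{2\eta_1\eta_3}{\eta_1 + 4\eta_3}}\,\big(\|e^k\| + \|\theta^k\|\big). \tag{98}$$

Denoting

$$\eta_4 := \sqrt{\frac{4\eta_3}{\eta_1 + 4\eta_3}} \quad\text{and}\quad \eta_5 := \sqrt{\frac{2\eta_3}{\eta_1(\eta_1 + 4\eta_3)}\,\frac{1}{\tilde{\tau}^2} + \frac{2\eta_1\eta_3}{\eta_1 + 4\eta_3}},$$

we obtain (92) and, from the definition of $\eta_4$ and the fact that $\eta_1 > 0$ and $\eta_3 > 0$,

$$0 < \eta_4 < 1. \tag{99}$$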

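It follows from (92) that

$$\lambda_{k+1} \le \eta_4\lambda_k + \eta_5\big(\|e^k\| + \|\theta^k\|\big). \tag{100}$$

Then, for all $M > N$,

$$\begin{aligned}
\sum_{k=N+1}^{M}\lambda_k = \sum_{k=N}^{M-1}\lambda_{k+1}
&\le \eta_4\sum_{k=N}^{M-1}\lambda_k + \eta_5\sum_{k=N}^{M-1}\big(\|e^k\| + \|\theta^k\|\big) \\
&\le \eta_4\lambda_N + \eta_4\sum_{k=N+1}^{M}\lambda_k + \eta_5\sum_{k=N}^{M}\big(\|e^k\| + \|\theta^k\|\big).
\end{aligned} \tag{101}$$

Consequently,

$$\sum_{k=N+1}^{M}\lambda_k \le \frac{\eta_4}{1 - \eta_4}\lambda_N + \frac{\eta_5}{1 - \eta_4}\sum_{k=N}^{M}\big(\|e^k\| + \|\theta^k\|\big). \tag{102}$$

And hence,

$$\sum_{k=N+1}^{\infty}\lambda_k \le \frac{\eta_4}{1 - \eta_4}\lambda_N + \frac{\eta_5}{1 - \eta_4}\sum_{k=N}^{\infty}\big(\|e^k\| + \|\theta^k\|\big). \tag{103}$$

The proof now follows from (99) and (103).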
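The summation step from (100) to (102) can be checked on the worst case of (100), where the inequality holds with equality. The sketch below is only an illustration under hypothetical values $\eta_4 = 0.8$, $\eta_5 = 2$ and the summable sizes $s_k = \|e^k\| + \|\theta^k\| = 1/(k+1)^2$; none of these numbers come from the paper.

```python
import numpy as np

def tail_bound_check(eta4=0.8, eta5=2.0, lam0=1.0, K=2000, N=10):
    """Iterate the equality case of (100) and compare a tail sum of lambda_k with (102)."""
    assert 0.0 < eta4 < 1.0                        # this is (99)
    s = 1.0 / (np.arange(K) + 1.0) ** 2             # hypothetical summable s_k = ||e^k|| + ||theta^k||
    lam = np.empty(K + 1)
    lam[0] = lam0
    for k in range(K):
        lam[k + 1] = eta4 * lam[k] + eta5 * s[k]    # worst case of (100)
    tail = lam[N + 1:].sum()                        # sum_{k=N+1}^{K} lambda_k
    # bound of (102) with M = K; the s_K term is omitted, which only makes the bound smaller
    bound = (eta4 * lam[N] + eta5 * s[N:].sum()) / (1.0 - eta4)
    return tail, bound

tail, bound = tail_bound_check()
print(tail <= bound)   # True
```

Because every partial tail is bounded independently of $M$, letting $M \to \infty$ gives (103); the assertion on `eta4` mirrors why (99) is needed for dividing by $1 - \eta_4$.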

Finally, we are ready to prove that sequences generated by the PSG method with bounded outer perturbations of (29) converge to a stationary point in $S$. We do this by combining Proposition 3.2, Proposition 3.3 and Proposition 3.5.

Theorem 3.2 Under the conditions of Proposition 3.4, if $\{x^k\}_{k=0}^\infty$ is any sequence generated by the PSG method with bounded outer perturbations of (29), then it converges to a stationary point of the problem (1), i.e., to a point in $S$.
Proof Obviously,

$$\sqrt{|J(x^k) - J(x^{k+1})|} \le \sqrt{|J(x^k) - J^*| + |J(x^{k+1}) - J^*|} \le \lambda_k + \lambda_{k+1}, \tag{104}$$

which implies, by Proposition 3.5, that

$$\sum_{k=0}^{\infty}\sqrt{|J(x^k) - J(x^{k+1})|} < +\infty. \tag{105}$$

This, along with Proposition 3.2, guarantees that

$$\sum_{k=0}^{\infty}\|x^k - x^{k+1}\| < +\infty, \tag{106}$$

which implies that $\{x^k\}_{k=0}^\infty$ is a Cauchy sequence and hence converges. Denoting $x^* := \lim_{k\to\infty} x^k$ and using Proposition 3.3, we get from (?) that $\|r(x^*)\| = 0$, i.e., $x^* \in S$, and the proof is complete.


4 Bounded Perturbation Resilience of PSG Methods 

In this section, we prove the bounded perturbation resilience (BPR) of PSG methods. This property is fundamental for the application of the superiorization methodology (SM) to them. We do this by establishing a relationship between BPR and the convergence of PSG methods with bounded outer perturbations.


4.1 Bounded Perturbation Resilience 

The superiorization methodology (SM) of [?] is intended for nonlinear constrained minimization (CM) problems of the form

$$\operatorname{minimize}\ \{\phi(x) \mid x \in \Psi\}, \tag{107}$$

where $\phi: \mathbb{R}^n \to \mathbb{R}$ is an objective function and $\Psi \subseteq \mathbb{R}^n$ is the solution set of another problem. The set $\Psi$ could be the solution set of a convex feasibility problem (CFP) of the form: find a vector $x^* \in \Psi := \bigcap_{i=1}^{I} C_i$, where the sets $C_i \subseteq \mathbb{R}^n$ ($1 \le i \le I$) are closed convex subsets of the Euclidean space $\mathbb{R}^n$; see, e.g., [?] or [?, Chapter 5] for results and references on this broad topic. In such a case we deal in (107) with a standard CM problem. Here we are interested in the case wherein $\Psi$ is the solution set of another CM, namely the one presented at the beginning of the paper,

$$\operatorname{minimize}\ \{J(x) \mid x \in \Omega\}, \tag{108}$$

i.e., we wish to look at

$$\Psi := \{x^* \in \Omega \mid J(x^*) \le J(x) \text{ for all } x \in \Omega\}, \tag{109}$$

assuming that $\Psi$ is nonempty.

In either case, or any other case of the set $\Psi$, the SM does not strive to solve (107); rather, the task is to find a point in $\Psi$ that is superior (i.e., has a lower, but not necessarily minimal, value of the objective function $\phi$) to one returned by an algorithm that solves (108) alone. This is done in the SM by first investigating the bounded perturbation resilience of an algorithm designed to solve (108) and then proactively using such permitted perturbations in order to steer the iterates of such an algorithm toward lower values of the objective function $\phi$ while not losing the overall convergence to a point in $\Psi$. See [?, 33] for details of the SM. A recent review of superiorization-related previous work appears in [?, Section 3].

In this paper we do not perform superiorization of any algorithm. Such superiorization of the EM algorithm with total variation (TV) serving as the objective function $\phi$, and an application of the approach to an inverse problem of image reconstruction for bioluminescence tomography, will be presented in a sequel paper. Our aim here is to pave the way for such an application by proving the bounded perturbation resilience that is needed in order to do superiorization.

For technical reasons that will become clear as we proceed, we introduce an additional set $\Theta$ such that $\Psi \subseteq \Theta \subseteq \mathbb{R}^n$ and assume that we have an algorithmic operator $\mathbf{A}_\Psi: \mathbb{R}^n \to \Theta$ that defines a Basic Algorithm as follows.

Algorithm 4.1 The Basic Algorithm

Initialization: $x^0 \in \Theta$ is arbitrary;

Iterative Step: Given the current iterate vector $x^k$, calculate the next iterate $x^{k+1}$ by

$$x^{k+1} = \mathbf{A}_\Psi(x^k). \tag{110}$$

The bounded perturbation resilience (henceforth abbreviated by BPR) of 
such a basic algorithm is defined next. 

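Definition 4.2 Bounded Perturbation Resilience (BPR) An algorithmic operator $\mathbf{A}_\Psi: \mathbb{R}^n \to \Theta$ is said to be bounded perturbations resilient if the following holds. If Algorithm 4.1 generates sequences $\{x^k\}_{k=0}^\infty$ with $x^0 \in \Theta$ that converge to points in $\Psi$, then any sequence $\{y^k\}_{k=0}^\infty$, starting from any $y^0 \in \Theta$, generated by

$$y^{k+1} = \mathbf{A}_\Psi(y^k + \beta_k v^k), \quad \text{for all } k \ge 0, \tag{111}$$

where (i) the vector sequence $\{v^k\}_{k=0}^\infty$ is bounded, (ii) the scalars $\{\beta_k\}_{k=0}^\infty$ are such that $\beta_k \ge 0$ for all $k \ge 0$ and $\sum_{k=0}^\infty \beta_k < \infty$, and (iii) $y^k + \beta_k v^k \in \Theta$ for all $k \ge 0$, also converges to a point in $\Psi$.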
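To make Algorithm 4.1 and Definition 4.2 concrete, the sketch below runs one algorithmic operator both as in (110) and in the perturbed form (111). Every choice in it is an illustrative assumption, not the paper's setting: $\Theta = [0,1]^4$, $\Psi$ is the solution set of a box-constrained least-squares problem (a single point, since the random matrix has full column rank), `basic_operator` is a hypothetical $\mathbf{A}_\Psi$ given by one projected gradient step with step size $1/L$, $\beta_k = 0.9^k$, and $v^k$ points toward the center of the box so that condition (iii) holds.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)
L = np.linalg.norm(A.T @ A, 2)          # Lipschitz constant of grad J, J(x) = 0.5*||Ax - b||^2
center = 0.5 * np.ones(4)               # centre of Theta = [0, 1]^4

def basic_operator(x):
    """Hypothetical A_Psi: one projected gradient step onto Theta = [0, 1]^4."""
    return np.clip(x - (A.T @ (A @ x - b)) / L, 0.0, 1.0)

def run_basic(x0, iters=5000):
    x = x0.copy()
    for _ in range(iters):
        x = basic_operator(x)                  # iterative step (110)
    return x

def run_perturbed(y0, iters=5000):
    y = y0.copy()
    for k in range(iters):
        beta = 0.9 ** k                        # (ii): beta_k >= 0 and summable
        v = center - y                         # (i): ||v^k|| <= 1 on [0, 1]^4
        y = basic_operator(y + beta * v)       # (111); (iii) holds since beta_k <= 1
    return y

x_lim = run_basic(rng.random(4))
y_lim = run_perturbed(rng.random(4))
print(np.linalg.norm(x_lim - y_lim))           # close to zero
```

In the SM the directions $v^k$ would instead be chosen to decrease $\phi$; the centering direction used here is only a simple bounded choice that keeps $y^k + \beta_k v^k$ inside $\Theta$.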

Comparing this definition with [14, Definition 1], [33, Subsection II.C] and [?, Definition 4.2], we observe that (iii) in Definition 4.2 above is needed only if $\Theta \ne \mathbb{R}^n$. In that case, condition (iii) of Definition 4.2 above is enforced in the superiorized version of the basic algorithm; see step (xiv) in the “Superiorized Version of Algorithm P” in [?, p. 5537] and step (14) in the “Superiorized Version of the ML-EM Algorithm” in [?, Subsection II.B]. This will be the case in the present work.

An important special case, from which the superiorization methodology originally grew and developed, is when $\Psi$ is the solution set of the (linear) convex feasibility problem and $\mathbf{A}_\Psi$ is a string-averaging projection method. This was discussed and experimented with for problems of image reconstruction from projections wherein the function $\phi$ of (107) was the total variation (TV) of the image vector $x$; see [8, 24].

Note also that in later works [?] the notion of BPR was replaced by that of strong perturbation resilience, which caters to situations where $\Psi$ might be empty; however, we still work here with the above asymptotic notion of BPR and assume that $\Psi$ is nonempty. Treating the PSG method as the Basic Algorithm $\mathbf{A}_\Psi$, our strategy was to first prove convergence of the PSG iterative algorithm with bounded outer perturbations, i.e., convergence of

$$x^{k+1} = P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k) + e^k\big). \tag{112}$$

We show next how the convergence of this yields BPR according to Definition 4.2. Such a two-step strategy was also applied in [8, p. 541].
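Iteration (112) can be tried out on a small instance. The sketch below assumes, purely for illustration, $J(x) = \frac{1}{2}\|Ax - b\|^2$, $\Omega$ the nonnegative orthant, a hypothetical constant diagonal scaling $D = \operatorname{diag}(d)$ with $d_j = 1/\|a_j\|^2$ (not necessarily of form (16) or (24)), a fixed step size $1/\|D^{1/2}A^TAD^{1/2}\|_2$, and random outer perturbations with $\|e^k\| = 0.7^k$, so that $\sum_k \|e^k\| < \infty$. It compares the perturbed run against an unperturbed reference run.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def J(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

def grad_J(x):
    return A.T @ (A @ x - b)

d = 1.0 / np.sum(A ** 2, axis=0)                 # hypothetical constant diagonal scaling D = diag(d)
AD = A * np.sqrt(d)                              # A D^{1/2}
tau = 1.0 / np.linalg.norm(AD.T @ AD, 2)         # step size 1 / ||D^{1/2} A^T A D^{1/2}||

def psg(x0, iters=4000, outer=False):
    """PSG iteration on Omega = nonnegative orthant; with outer=True it runs (112)."""
    x = x0.copy()
    for k in range(iters):
        e = np.zeros(n)
        if outer:
            u = rng.standard_normal(n)
            e = 0.7 ** k * u / np.linalg.norm(u)     # ||e^k|| = 0.7^k, so sum ||e^k|| < inf
        x = np.maximum(x - tau * d * grad_J(x) + e, 0.0)
    return x

x_ref = psg(np.zeros(n))                          # unperturbed run, used as reference for J*
x_out = psg(rng.random(n), outer=True)            # run with bounded outer perturbations
print(J(x_out) - J(x_ref))                        # tiny: both runs reach the same minimizer
```

Because the orthant projection is separable, this scaled iteration is an ordinary projected gradient method after the change of variables $x = D^{1/2}w$, which is why a step size based on $\|D^{1/2}A^TAD^{1/2}\|_2$ is used here.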

A superiorized version of any Basic Algorithm employs the perturbed version of the Basic Algorithm as in (111). A certificate to do so in the superiorization method, see [13], is gained by showing that the Basic Algorithm is BPR (or strongly perturbation resilient, a notion not discussed in the present paper). Therefore, proving the BPR of an algorithm is the first step toward superiorizing it. This is done for the PSG method in the next subsection.


4.2 The BPR of PSG Methods as a Consequence of Bounded Outer 
Perturbation Resilience 

In this subsection, we prove the BPR of the PSG method whose iterative step is given by (5). To this end we treat the right-hand side of (5) as the algorithmic operator $\mathbf{A}_\Psi$ of Definition 4.2; namely, we define, for all $k \ge 0$,

$$\mathbf{A}_\Psi(x^k) := P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k)\big), \tag{113}$$

and we identify the solution set $\Psi$ there with the set $S$ of (26), and the additional set $\Theta$ there with the constraint set $\Omega$ of (1).

According to Definition 4.2, we need to show convergence of any sequence $\{x^k\}_{k=0}^\infty$ that, starting from any $x^0 \in \Omega$, is generated by

$$x^{k+1} = P_\Omega\Big((x^k + \beta_k v^k) - \tau_k D(x^k + \beta_k v^k)\nabla J(x^k + \beta_k v^k)\Big), \tag{114}$$

for all $k \ge 0$, to a point in $S$ of (26), where $\{v^k\}_{k=0}^\infty$ and $\{\beta_k\}_{k=0}^\infty$ obey conditions (i) and (ii) in Definition 4.2, respectively, and condition (iii) in Definition 4.2 also holds.

The next theorem establishes the bounded perturbation resilience of the PSG methods. The proof idea is to build a relationship between BPR and the convergence of PSG methods with bounded outer perturbations.
We caution the reader that we introduce below the assumption that the set $\Omega$ is bounded. This forces us to modify the problems (1) and (108) by replacing $\Omega$ with some bounded subset of it in order to apply our results. While this is admittedly a mathematically weaker result than we hoped for, we note that this would not be a harsh limitation in practical applications, wherein such boundedness can be achieved from problem-related practical considerations.

Theorem 4.1 Given a nonempty, closed, convex and bounded set $\Omega \subseteq \mathbb{R}^n$, assume that $J \in S^{1,1}_L(\Omega)$ (i.e., $J$ obeys (?) and (?)) and that there exists at least one point $x_\Omega \in \Omega$ such that $\|\nabla J(x_\Omega)\| < +\infty$. Let $\{\tau_k\}_{k=0}^\infty$ be a sequence of positive scalars that fulfills (58), let $\{D(x^k)\}_{k=0}^\infty$ be a sequence of diagonal scaling matrices that is either of form (16) or of form (24), and assume that (?) holds. Under these assumptions, if the vector sequence $\{v^k\}_{k=0}^\infty$ is bounded and the scalars $\{\beta_k\}_{k=0}^\infty$ are such that $\beta_k \ge 0$ for all $k \ge 0$ and $\sum_{k=0}^\infty \beta_k < \infty$, then, for any $x^0 \in \Omega$, any sequence $\{x^k\}_{k=0}^\infty$ generated by (114) such that $x^k + \beta_k v^k \in \Omega$ for all $k \ge 0$ converges to a point in $S$ of (26).

Proof The proof is in two steps. For the first step, we build a relationship between (114) and the PSG method with bounded outer perturbations. For the second step, we invoke Theorem 3.2 and establish the convergence result.

Step 1. We show that any sequence generated by (114) satisfies

$$x^{k+1} = P_\Omega\big(x^k - \tau_k D(x^k)\nabla J(x^k) + e^k\big), \tag{115}$$

with $\sum_{k=0}^\infty \|e^k\| < +\infty$. Since $\Omega$ is a bounded subset of $\mathbb{R}^n$, there exists an $r_\Omega > 0$ such that $\Omega \subseteq B(x_\Omega, r_\Omega)$, where $B(x_\Omega, r_\Omega) \subseteq \mathbb{R}^n$ is the ball centered at $x_\Omega$ with radius $r_\Omega$. Then, for any $x \in \Omega$,

$$\|x - x_\Omega\| \le r_\Omega \;\Rightarrow\; \|x\| \le \|x_\Omega\| + r_\Omega. \tag{116}$$

The Lipschitz continuity of $\nabla J(x)$ on $\Omega$ and (116) imply that, for any $x \in \Omega$,

$$\|\nabla J(x) - \nabla J(x_\Omega)\| \le L\|x - x_\Omega\| \;\Rightarrow\; \|\nabla J(x)\| \le \|\nabla J(x_\Omega)\| + L r_\Omega. \tag{117}$$

Since the sequence $\{x^k\}_{k=0}^\infty$ generated by (114) is contained in $\Omega$, due to the projection operation $P_\Omega$, and $x^k + \beta_k v^k$ is also in $\Omega$, it holds that, for all $k \ge 0$, $x^k$ and $x^k + \beta_k v^k$ satisfy (116), and $\nabla J(x^k)$ and $\nabla J(x^k + \beta_k v^k)$ satisfy (117). Besides, the boundedness of $\{v^k\}_{k=0}^\infty$ implies that there exists a $v > 0$ such that $\|v^k\| \le v$ for all $k \ge 0$. Therefore, we have

$$\|\beta_k v^k\| \le v\beta_k. \tag{118}$$

From (114), the outer perturbation term of (115) is given by

$$\begin{aligned}
e^k &= \big(x^k + \beta_k v^k - \tau_k D(x^k + \beta_k v^k)\nabla J(x^k + \beta_k v^k)\big) - \big(x^k - \tau_k D(x^k)\nabla J(x^k)\big) \\
&= \beta_k v^k + \tau_k\big(D(x^k)\nabla J(x^k) - D(x^k + \beta_k v^k)\nabla J(x^k + \beta_k v^k)\big).
\end{aligned} \tag{119}$$

Given that $D(x)$ is either of form (16) or of form (24), we consider the two cases separately. In what follows, we repeatedly use the fact that $\|ABx\| \le \|AB\|_F\|x\| \le \|A\|_F\|B\|_F\|x\|$ for any $A, B \in \mathbb{R}^{n\times n}$ and $x \in \mathbb{R}^n$, with $\|\cdot\|_F$ the Frobenius norm of a matrix; see, e.g., [?, Section 2.3].
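(i) Assume that $D(x)$ is of form (16), namely that $D(x) = \mathcal{D}_{\mathrm{LS}}$ for any $x$. For this case, combining (119) with the Lipschitz continuity of $\nabla J$, the step-size bound $\tau_k \le 2/L$ and (118), and using the Minkowski inequality, we get

$$\begin{aligned}
\|e^k\| &= \big\|\beta_k v^k + \tau_k\mathcal{D}_{\mathrm{LS}}\big(\nabla J(x^k) - \nabla J(x^k + \beta_k v^k)\big)\big\| \\
&\le \|\beta_k v^k\| + \tau_k\|\mathcal{D}_{\mathrm{LS}}\|_F\,\|\nabla J(x^k) - \nabla J(x^k + \beta_k v^k)\| \\
&\le \|\beta_k v^k\| + \tau_k\|\mathcal{D}_{\mathrm{LS}}\|_F\, L\,\|\beta_k v^k\| \\
&\le \big(1 + 2\|\mathcal{D}_{\mathrm{LS}}\|_F\big)v\beta_k.
\end{aligned} \tag{120}$$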

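(ii) Assume that $D(x)$ is of form (24), namely that $D(x) := \mathcal{D}X$ with $\mathcal{D} = \operatorname{diag}\{1/s_j\}$ and $X = \operatorname{diag}\{x_j\}$ diagonal matrices. In this case, combining (119) with the Lipschitz continuity of $\nabla J$, the step-size bound $\tau_k \le 2/L$, (116), (117) and (118), and using the Minkowski inequality, we get

$$\begin{aligned}
\|e^k\| &= \big\|\beta_k v^k + \tau_k\big(D(x^k)\nabla J(x^k) - D(x^k + \beta_k v^k)\nabla J(x^k + \beta_k v^k)\big)\big\| \\
&= \big\|\beta_k v^k + \tau_k\big(D(x^k)\nabla J(x^k) - D(x^k + \beta_k v^k)\nabla J(x^k)\big) \\
&\qquad + \tau_k\big(D(x^k + \beta_k v^k)\nabla J(x^k) - D(x^k + \beta_k v^k)\nabla J(x^k + \beta_k v^k)\big)\big\| \\
&\le \|\beta_k v^k\| + \tau_k\|\mathcal{D}(X^k - \tilde{X}^k)\nabla J(x^k)\| + \tau_k\|\mathcal{D}\tilde{X}^k\big(\nabla J(x^k) - \nabla J(x^k + \beta_k v^k)\big)\| \\
&\le \|\beta_k v^k\| + \tau_k\|\mathcal{D}\|_F\,\|X^k - \tilde{X}^k\|_F\,\|\nabla J(x^k)\| + \tau_k L\|\mathcal{D}\|_F\,\|\tilde{X}^k\|_F\,\|\beta_k v^k\| \\
&= \big(1 + \tau_k\|\mathcal{D}\|_F\|\nabla J(x^k)\| + \tau_k L\|\mathcal{D}\|_F\|\tilde{X}^k\|_F\big)\|\beta_k v^k\| \qquad (121) \\
&\le \big(1 + 2\|\mathcal{D}\|_F\|\nabla J(x^k)\|/L + 2\|\mathcal{D}\|_F\|x^k + \beta_k v^k\|\big)v\beta_k \qquad (122) \\
&\le \Big(1 + 2\|\mathcal{D}\|_F\big(\|\nabla J(x_\Omega)\|/L + \|x_\Omega\| + 2r_\Omega\big)\Big)v\beta_k, \qquad (123)
\end{aligned}$$

where $X^k := \operatorname{diag}\{x^k_j\}$ and $\tilde{X}^k := \operatorname{diag}\{(x^k + \beta_k v^k)_j\}$; (121) holds by the fact that $\|X^k - \tilde{X}^k\|_F = \|x^k - (x^k + \beta_k v^k)\| = \|\beta_k v^k\|$, (122) holds since $\|\tilde{X}^k\|_F = \|x^k + \beta_k v^k\|$, $\tau_k \le 2/L$ and (118), and (123) holds by (116) and (117).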

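Defining the constant

$$C_\Omega := v + 2v\max\Big\{\|\mathcal{D}_{\mathrm{LS}}\|_F,\; \|\mathcal{D}\|_F\big(\|\nabla J(x_\Omega)\|/L + \|x_\Omega\| + 2r_\Omega\big)\Big\}, \tag{124}$$

and considering (120) or (123) yields that in either case, (i) or (ii),

$$\|e^k\| \le C_\Omega\beta_k. \tag{125}$$

Then, $\sum_{k=0}^\infty \beta_k < \infty$ implies that $\sum_{k=0}^\infty \|e^k\| < +\infty$.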

Step 2. Under the given conditions, by invoking Theorem 3.2 we know that, for any $x^0 \in \Omega$, any sequence $\{x^k\}_{k=0}^\infty$ generated by (115) in which $\sum_{k=0}^\infty \|e^k\| < +\infty$ converges to a point in $S$ of (26). By Step 1, the sequence generated by (114) is such a sequence; hence it also converges to a point in $S$.
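The key estimate of Step 1, $\|e^k\| \le C_\Omega\beta_k$ in (125), can be checked numerically for scaling matrices of form (24). The sketch below assumes, for illustration only, $J(x) = \frac{1}{2}\|Ax - b\|^2$ with a random positive matrix $A$, $\Omega = [0,1]^n$ with $x_\Omega$ its center and $r_\Omega = \frac{1}{2}\sqrt{n}$, hypothetical weights $s_j$ taken as the column sums of $A$, $\tau_k = 1/L$, $\beta_k = 0.5^k$, and $v^k = x_\Omega - x^k$ so that $x^k + \beta_k v^k \in \Omega$. It computes $e^k$ exactly as in (119) along a run of (114) and reports the largest ratio $\|e^k\|/(C_\Omega\beta_k)$, with $C_\Omega$ from (124) restricted to case (ii).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 12, 5
A = rng.random((m, n))                     # positive entries, so every s_j below is positive
b = rng.random(m)

def grad_J(x):
    return A.T @ (A @ x - b)               # J(x) = 0.5*||Ax - b||^2 (illustrative choice)

L = np.linalg.norm(A.T @ A, 2)             # Lipschitz constant of grad J
Dcal = np.diag(1.0 / A.sum(axis=0))        # hypothetical diag{1/s_j}, s_j = column sums of A

def D(x):
    return Dcal @ np.diag(x)               # form (24): D(x) = Dcal X

x_om = 0.5 * np.ones(n)                    # x_Omega: centre of Omega = [0, 1]^n
r_om = 0.5 * np.sqrt(n)                    # Omega is contained in B(x_Omega, r_Omega)
v_max = 0.5 * np.sqrt(n)                   # bound on ||v^k|| for the v^k chosen below
tau = 1.0 / L                              # satisfies tau_k <= 2/L
C_om = v_max * (1.0 + 2.0 * np.linalg.norm(Dcal, "fro")
                * (np.linalg.norm(grad_J(x_om)) / L + np.linalg.norm(x_om) + 2.0 * r_om))

def step(z):
    return z - tau * D(z) @ grad_J(z)      # scaled gradient step before projection

def worst_ratio(iters=30):
    """Run (114) on Omega = [0, 1]^n and return max_k ||e^k|| / (C_Omega * beta_k)."""
    x = rng.random(n)
    worst = 0.0
    for k in range(iters):
        beta = 0.5 ** k
        v = x_om - x                       # x + beta*v stays in Omega because beta <= 1
        y = x + beta * v
        e = step(y) - step(x)              # outer perturbation e^k, as in (119)
        worst = max(worst, np.linalg.norm(e) / (C_om * beta))
        x = np.clip(step(y), 0.0, 1.0)     # perturbed PSG step (114)
    return worst

print(worst_ratio() <= 1.0)   # (125) predicts True
```

Since $\tau_k = 1/L$ is half the largest step covered by the estimate, the observed ratio typically stays well below 1; the check only confirms the inequality on this instance and is not a substitute for the proof.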